Abstract

Benchmarking of prognostic algorithms has been challenging due to
limited availability of common datasets suitable for prognostics. In
an attempt to alleviate this problem, several benchmarking datasets
have been collected by NASA's Prognostics Center of Excellence and
made available to the Prognostics and Health Management (PHM)
community to allow evaluation and comparison of prognostics
algorithms. Among those datasets are five C-MAPSS datasets that have
been extremely popular due to their unique characteristics making
them suitable for prognostics. The C-MAPSS datasets pose several
challenges that have been tackled by different methods in the PHM
literature. In particular, management of high variability due to
sensor noise, effects of operating conditions, and presence of
multiple simultaneous fault modes are some factors that have great
impact on the generalization capabilities of prognostics algorithms.
More than 70 publications have used the C-MAPSS datasets for
developing data-driven prognostic algorithms. The C-MAPSS datasets
are also shown to be well-suited for development of new machine
learning and pattern recognition tools for several key preprocessing
steps such as feature extraction and selection, failure mode
assessment, operating conditions assessment, health status
estimation, uncertainty management, and prognostics performance
evaluation. This paper summarizes a comprehensive literature review
of publications using C-MAPSS datasets and provides guidelines and
references to further usage of these datasets in a manner that allows
clear and consistent comparison between different approaches.

1.  Introduction 

In the past decade, the science of prognostics has matured
considerably and the general understanding of the health prediction
problem and its applications has greatly improved. Both data-driven
and physics-based methods have been shown to possess unique
advantages that are specific to application contexts. However, until
very recently, a common bottleneck in the development of data-driven
methods was the lack of availability of run-to-failure data sets. In
most real-world cases, data contain fault signatures for a growing
fault at various severity levels, but little or no data capture fault
evolution all the way through failure. Procuring actual system fault
progression data is typically time consuming and expensive. Fielded
systems are, most of the time, not properly instrumented for
collection of relevant data, or their operators are unable to
distribute such data due to proprietary constraints. The lack of
common data sets, which researchers can use to compare their
approaches, has been an impediment to progress in the field of
prognostics. To tackle this problem, a prognostics data repository
was established (Saxena & Goebel, 2008). Several datasets have since
been published and used by researchers around the world. Among these
datasets are five datasets from a turbofan engine simulation model,
C-MAPSS (Commercial Modular Aero-Propulsion System Simulation)
(Frederick, DeCastro, & Litt, 2007). By simulating a variety of
operational conditions and injecting faults of varying degree of
degradation, datasets were generated for prognostics development
(Saxena, Goebel, Simon, & Eklund, 2008a). One of the first datasets
was used for a prognostics data challenge at the PHM'08 conference. A
subsequent set was released later with varying degrees of complexity.
These datasets have since been used very widely in publications for
benchmarking prognostics algorithms.

The turbofan degradation datasets have received over seven thousand
unique downloads in the last five years, but algorithms developed
using them have been published in only about seventy publications.
Furthermore, in many publications it is not clear how authors are
computing results and comparing with others. There has been confusion
and inconsistency in how these datasets have been interpreted and
used in many cases. Consequently, not all comparisons of performance
can be considered valid. Therefore, this paper intends to analyze
various approaches that researchers have taken to implement
prognostics using these turbofan datasets. Some unique
characteristics of these datasets are also identified that led to the
use of certain methods more often than others. Specifically, various
differences among these datasets are pointed out. A commentary is
provided on how these approaches fared compared to the winners of the
data challenge. Furthermore, this paper also attempts to clarify
several issues so that researchers, in the future, can take these
factors into account when comparing their approaches with the
benchmarks.

The paper is organised as follows. In Section 2, the C-MAPSS datasets
are presented. Section 3 is dedicated to the literature review.
Section 4 presents a taxonomy of prognostics approaches for C-MAPSS
datasets. Finally, Section 5 provides some guidelines to help future
users in developing new prognostic algorithms applied to these
datasets and in facilitating algorithm benchmarking.

2.  C-MAPSS  Datasets 

C-MAPSS is a tool, coded in the MATLAB-Simulink® environment, for
simulating an engine model of the 90,000 lb thrust class (Frederick
et al., 2007). Using a number of editable input parameters it is
possible to specify operational profile, closed-loop controllers,
environmental conditions (various altitudes and temperatures), etc.
Additionally, there are provisions to modify some efficiency
parameters to simulate various degradations in different sections of
the engine system.

2.1.  Datasets  characteristics 

Using this simulation environment, five datasets were generated. By
creating a custom code wrapper, as described in (Saxena, Goebel, et
al., 2008a), selected fault injection parameters were varied to
simulate continuous degradation trends. Data from various parts of
the system were collected to record effects of degradations on sensor
measurements and provide time series exhibiting degradation behaviors
in multiple units. These datasets possess unique characteristics that
make them very useful and suitable for developing prognostic
algorithms.

1.  Data represent a multi-dimensional response from a complex
non-linear system from a high fidelity simulation that very closely
models a real system.

2.  These simulations incorporated high levels of noise introduced at
various stages to accommodate the nature of variability generally
encountered.

3.  The effects of faults are masked due to operational conditions,
which is yet another common trait of most operational systems.

4.  Data from many units are provided to allow algorithms to extract
trends and build associations for learning system behavior useful for
predicting RUL.

These  datasets  were  geared  towards  data-driven  approaches 
where  very  little  or  no  system  information  was  made  available 
to  PHM  developers. 

As described in detail in Section 3, analysis of the publications
using these datasets shows that many researchers have tried to make
comparisons between results obtained from these similar yet different
datasets. This section briefly describes and distinguishes the five
datasets and explains why it may or may not be appropriate to make
such comparisons. Table 1 summarizes the five datasets. The
fundamental difference between these datasets is attributed to the
number of simultaneous fault modes and the operational conditions
simulated in these experiments. Datasets #1 through #4 incorporate an
increasing level of complexity and may be used to incrementally learn
the effects of faults and operational conditions. Furthermore, what
sets these four datasets apart from the challenge datasets is the
availability of ground truth to measure performance. Datasets #1 to
#4 each consist of a training set that users can use to train their
algorithms and a test set to test the algorithms. The ground truth
RUL values for the test set are also given to assess prediction
errors and compute any metrics for comparison purposes. Results
between these datasets may not always be comparable as these data
simulate different levels of complexity, unless a universal
generalized model is available that regards datasets #1 to #3 as
special cases of dataset #4.

The PHM challenge datasets are designed in a slightly different way
and divided into three parts. Dataset #5T contains a training set and
a test set just like datasets #1 to #4, with one difference: the
ground truth RUL values for the test set are not revealed. The
challenge participants were asked to upload their results (only once
per day) to receive a score based on an asymmetrical scoring function
(see (Saxena, Goebel, et al., 2008a)). Users can still get their
results evaluated using the same scoring function by uploading their
results on the repository page, but otherwise it is not possible to
compute any other metric on the results in the absence of ground
truth to allow error computation. The third part of the challenge set
is dataset #5V, the final validation set that was used to rank the
challenge participants, who were allowed only one chance to submit
their results. The challenge remains open, and a participant may
submit final results (only once) for evaluation per instructions
posted with the dataset on the NASA repository (Saxena & Goebel,
2008).
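
To make the idea of an asymmetrical scoring function concrete, a minimal sketch is given below. The text above only states that the function is asymmetrical; the exponential form and the constants `a_early=13` and `a_late=10` used here are assumptions taken from how the score is commonly quoted in the C-MAPSS literature and should be checked against (Saxena, Goebel, et al., 2008a) before comparing with published scores. The function name `phm08_score` is hypothetical.

```python
import math


def phm08_score(rul_true, rul_pred, a_early=13.0, a_late=10.0):
    """Asymmetric exponential score summed over units; lower is better.

    ASSUMPTION: the exponential form and the default constants follow
    common usage in the literature, not a definition given in this text.
    """
    total = 0.0
    for true, pred in zip(rul_true, rul_pred):
        d = pred - true  # d < 0: early prediction, d > 0: late prediction
        if d < 0:
            total += math.exp(-d / a_early) - 1.0
        else:
            total += math.exp(d / a_late) - 1.0
    return total


# A late prediction is penalised more than an equally large early one.
late = phm08_score([100], [110])
early = phm08_score([100], [90])
```

Under these assumed constants, predicting a failure too late costs more than predicting it too early by the same number of cycles, which reflects the preference for conservative predictions.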

Table 1. Description of the five turbofan degradation datasets
available from NASA repository.

Datasets                         #Fault Modes  #Conditions  #Train Units  #Test Units
Turbofan data from NASA    #1    1             1            100           100
repository                 #2    1             6            260           259
                           #3    2             1            100           100
                           #4    2             6            249           248
PHM2008 Data Challenge     #5T   1             6            218           218
                           #5V   1             6            218           435

2.2.  Performance  Benchmarking

One of the key drivers for this study was to assess the
state-of-the-art in prognostic methods established through
comparisons and performance benchmarking. However, the survey
revealed a serious lack of consistency in methods used for
performance evaluation. One of the key contributing reasons for this
inconsistency is thought to be the unavailability of an established
performance benchmark. Originally it was planned that the PHM08
challenge winning performances would establish a benchmark that would
allow further improvements as new methods are developed. But since
the challenge webpage was taken down in subsequent years, these
scores have not been easily available except as reported (often
partially) in some publications from the winners. It is, therefore,
planned to compute several relevant metrics on the results submitted
during the PHM08 challenge and make them available to serve as
reference for future efforts. These benchmarks, however, remain
beyond the scope of this paper and will be made available in future
publications.

3.  C-MAPSS  Dataset  Literature  Review 

To analyze various approaches that have been used to solve the
C-MAPSS dataset problem, all the publications that cite these
datasets, including the references recommended by the repository,
were collected through standard web search. The search returned over
seventy publications, which were then preprocessed to identify
overlapping efforts by the same authors and publications that only
cite the datasets but apparently did not use them for algorithm
development. This resulted in forty unique publications that were
then considered for review and analysis in this work.

For the sake of readability, each of these publications was assigned
a unique ID to use in the various tables summarizing the results
presented in this section. This mapping between publications and IDs
is presented in Table 10 in the appendix. Furthermore, to keep the
paper length short, a detailed review of each of the forty
publications is not included, only the summarized findings.

The analysis of the collected publications reveals several important
observations that are summarized here. First, these publications are
binned into different categories and then analyzed for the
distributions thus observed. These categories and corresponding
findings are presented next.

3.1.  C-MAPSS  Dataset  Used 

Table 2 identifies specific publications that use one or more of
these five datasets. It can be observed that dataset #1 was the most
used one (55%), followed by the test set (#5T) from the PHM08
challenge (35%), whereas the other datasets are relatively
underutilized. Three publications report generating their own
datasets using the C-MAPSS simulator, and (Richter, 2012) describes
the simulator and how it can be used to generate degradation data
rather than using any specific dataset.

The heavy usage of dataset #1 (≈ 70%) compared to all other datasets
among the four from the NASA Repository may be attributed to its
apparent simplicity compared to the rest, because some of the sensor
measurements in this dataset depict a monotonic trend. This may lead
to possible confusion of these sensor trends with health indicators.
High usage of dataset #5T is attributed to the PHM08 challenge, where
several teams had already used these data extensively, thereby
gaining significant familiarity with the dataset as well as a
preference due to availability of corresponding benchmark performance
from the challenge leaderboard.


Table 2. List of publications for each dataset.

Datasets                         Publication ID                           Ratio
Turbofan data from NASA    #1    5, 6, 10, 13, 14, 15, 19, 20, 23, 24,    22/40
repository                       25, 26, 27, 28, 31, 32, 33, 34, 36, 37,
                                 38, 40
                           #2    13, 22, 34, 40                           4/40
                           #3    34, 40                                   2/40
                           #4    7, 34, 40                                3/40
PHM08 Data challenge       #5T   1, 2, 3, 4, 8, 12, 16, 17, 21, 29, 30,   14/40
                                 34, 35, 40
                           #5V   1, 2, 3, 40                              4/40
Simulator                  OWN   9, 11, 39                                3/40
Other                      -     18                                       1/40

Annual Conference of the Prognostics and Health Management Society 2014

Several publications mentioned in Table 2 have used only the training
datasets that have complete (run-to-failure) trajectories. Using data
with complete trajectories gives access to the true End-of-Life (EOL)
to compute RUL from any time point in a degradation trajectory, which
could be used to generate a larger set of training data. This
approach is also relevant to estimating RUL at different time points
and allows the usage of prognostics metrics (Saxena, Celaya, et al.,
2008) such as Prognostic Horizon, the α-λ metric, or the convergence
measure. However, in a true learning sense the algorithm, once
trained, must be tested on unseen data for proper validation, as was
required for the PHM'08 challenge datasets. Table 3 shows that 11
different publications used the full training/testing datasets: the
training dataset for estimating the parameters of the algorithms and
the full testing dataset for performance evaluation.
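
The use of complete trajectories described above can be sketched as follows. This is a minimal illustration, assuming that the last recorded cycle of a training unit is its EOL; the helper names `rul_labels` and `expand_training_set` and the data layout are hypothetical.

```python
def rul_labels(cycles):
    """RUL at each recorded cycle of one run-to-failure unit.

    ASSUMPTION: `cycles` is ordered and its last entry is the EOL.
    """
    eol = cycles[-1]
    return [eol - c for c in cycles]


def expand_training_set(trajectories, min_length=1):
    """Turn each full trajectory into many (unit, truncated history, RUL) samples.

    `trajectories` maps a unit id to its list of per-cycle feature vectors.
    """
    samples = []
    for unit, rows in trajectories.items():
        n = len(rows)
        for t in range(min_length, n + 1):
            # History up to cycle t, with n - t cycles remaining until EOL.
            samples.append((unit, rows[:t], n - t))
    return samples


labels = rul_labels([1, 2, 3, 4])
samples = expand_training_set({"u1": [[0.1], [0.2], [0.3]]})
```

Each truncated history is a legitimate training instance with a known RUL, but samples derived from the same unit are strongly correlated, so validation should still hold out whole units.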

Table 3. List of publications using only full training/testing
datasets.

Datasets                           Publication ID          Ratio
Turbofan dataset from NASA   #1    20, 27, 28, 40          5/40
repository                   #2    40                      1/40
                             #3    40                      1/40
                             #4    40                      1/40
PHM08 Data challenge         #5T   1, 2, 3, 4, 16, 21, 40  7/40
                             #5V   1, 2, 3, 40             4/40

3.2.  Target  Problem  Being  Solved 

As expected, there is a wide variety of approaches taken in
interpreting the datasets, formulating a problem, and modeling the
system to solve the problem. However, contrary to expectations, a
significant number of publications have utilized these datasets for
analysis heavily focused on diagnosis (multi-class classification)
rather than prognostics.

By posing a multi-class classification problem, various publications
attempt to solve mainly three types of problems:

•  Supervised classification: The training dataset is labeled (known
classes for each feature vector);

•  Unsupervised classification: The classes are not known a priori
and data are not labeled;

•  Partially supervised classification: Some classes are precisely
known, others are unknown or are attached with a confidence value to
express belief in that class.

Publications 1, 7, 10, 20, 24, 27, 32 use classification for
preprocessing steps towards solving a prognostics problem.
Specifically, unsupervised classification algorithms are used in
publications 1, 7 to segment the dataset into the six operating
conditions. For reference, detailed information about various
simulated operating conditions in C-MAPSS is described in (Richter,
2012), which can also be used to label these datasets. Supervised and
unsupervised classification algorithms are also used in publications
6, 10, 20, 27, 32 to assign a degradation level according to sensor
measurements. The sequence of discrete failure degradation stages is
indeed relevant for the estimation of the current health state and
its prediction (Kim, 2010).
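
As an illustration of segmenting data into operating conditions by unsupervised classification, the sketch below runs a plain k-means on vectors of operational settings. It is not the algorithm of publications 1 or 7; the function `cluster_conditions`, the farthest-point initialisation, and the example setting values are all assumptions chosen for a small, deterministic example.

```python
def _dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: _dist2(p, centres[j]))
            for p in points]


def cluster_conditions(points, k=6, iterations=20):
    """Plain k-means on operational-setting vectors (hypothetical helper)."""
    # Farthest-point initialisation: deterministic and spreads the centres.
    centres = [tuple(points[0])]
    while len(centres) < k:
        centres.append(tuple(max(
            points, key=lambda p: min(_dist2(p, c) for c in centres))))
    for _ in range(iterations):
        labels = _assign(points, centres)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return _assign(points, centres), centres


# Made-up setting vectors forming three well-separated regimes.
settings = [(0.0, 0.0, 100.0), (0.1, 0.0, 100.0),
            (10.0, 0.25, 20.0), (10.1, 0.25, 20.0),
            (20.0, 0.7, 0.0), (20.1, 0.7, 0.0)]
labels, centres = cluster_conditions(settings, k=3)
```

In practice the settings have very different scales, so rescaling them before clustering is usually advisable; with `k=6` the same call would target the six operating conditions mentioned above.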

Health assessment, anomaly detection (seen as a 1-class
classification problem) or fault identification are tackled in
publications 6, 11, 12, 13, 26, 31, 35 using supervised
classification methods, and using partially supervised classification
techniques in publications 12, 27, 33. For these approaches, a known
target (or a degradation level) is required to evaluate the
classification rate. For instance, four degradation levels were
defined for labeling data in publications 6, 10, 27, 33: normal
degradation (class 1), knee corresponding to a noticeable degradation
(class 2, viewed as a transition between classes 1 and 3),
accelerated degradation (class 3) and failure (class 4). One such
segmentation is provided at the URL given in footnote 1, whereas a
different segmentation was proposed in publication 13. Using these
segmented data (clusters) as a proxy for ground truth, some level of
classification performance can be evaluated for comparison purposes.

As with classification, many different approaches were employed to
solve the prognostics problem of predicting RUL. In order to give due
attention to the analysis of prognostic methods, a discussion is
presented separately in Section 4.

3.3.  Method  for  Treatment  of  Uncertainty 

Given the inherent nature of datasets that include several noise
factors and lack of specific information on the effects of
operational conditions, it is important for algorithms to model and
account for uncertainty in the system. Different publications have
dealt with uncertainty at various stages of processing as described
below:

1.  Signal processing step such as noise filtering using a Kalman
filter as in publications 2, 3, 20, Gaussian kernel smoothing in
publications 1, 7, and functional principal component analysis in
publication 15.

2.  Feature extraction/selection step such as using principal
component analysis and other variants of it as suggested in
publications 1, 7, 13, grey-correlation in publication 22, and
computing relevance of features for prediction in publication 23.

3.  Health estimation step such as based on operating conditions
assessment to normalize/factor out the effects of operating
conditions as proposed in publications 1, 7, 21, 40 and using
non-linear regression.

4.  Classification step where uncertainty modeling plays a role on
data labeling using noisy and imprecise degradation levels as shown
in publications 12, 27, 33, or on the inference of a sequence of
degradation levels such as using Markov Models or multi-models as in
publications 6, 10, 24, 32, 34.

5.  Prediction step such as gradually incorporating prior knowledge
during estimation in presence of noise as proposed in publications 4,
14, 16, 17, 19, 21, 30, in determining failure thresholds as in
publications 10, 27, 32 or in representing health indicator such as
in publication 40 to be used in prediction.

6.  Information fusion step by merging multiple RUL estimates through
Bayesian updating as pointed out in publications 4, 21 or in
similarity-based matching as in publications 1, 27, 40.

1  http://members.femto-st.fr/emmanuel-ramasso/data-and-codes
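
The normalization mentioned in the health estimation step (factoring out the effects of operating conditions) can be sketched as a per-regime standardisation of one sensor channel. This is a minimal illustration, assuming that an operating-condition label is already available for each sample; the function `normalise_by_regime` and the example values are hypothetical and do not reproduce the procedure of any reviewed publication.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalise_by_regime(values, regimes):
    """Standardise one sensor channel separately within each regime.

    ASSUMPTION: `regimes[i]` is the operating-condition label of sample i.
    """
    grouped = defaultdict(list)
    for v, r in zip(values, regimes):
        grouped[r].append(v)
    # Mean and population standard deviation of the channel per regime.
    stats = {r: (mean(vs), pstdev(vs)) for r, vs in grouped.items()}
    out = []
    for v, r in zip(values, regimes):
        mu, sigma = stats[r]
        # A constant channel within a regime carries no information.
        out.append((v - mu) / sigma if sigma > 0 else 0.0)
    return out


# Two regimes with very different sensor levels but the same relative pattern.
normalised = normalise_by_regime([10.0, 12.0, 100.0, 104.0], [0, 0, 1, 1])
```

After this step both regimes are on a common scale, so a degradation trend is no longer hidden by jumps between operating conditions. For a fair evaluation the per-regime statistics would be estimated on training units only and reused for test units.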

A variety of different uncertainty representation theories are found
to be used. Table 4 classifies different publications according to
the theory of uncertainty treatment used in corresponding analysis
(Klir & Wierman, 1999). As shown in the table, probability theory is
the most popular one (65%), followed by set-membership approaches (in
particular fuzzy sets, with 15%), Dempster-Shafer's theory of belief
functions (13%), and other measures (such as polygon area and Choquet
integral).


Table 4. Methods for uncertainty management used on C-MAPSS datasets.

Theories             Publication ID                                 Ratio
Probability theory   1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 15, 16, 17,   26/40
                     19, 20, 21, 22, 26, 28, 29, 30, 31, 32, 33,
                     34, 35
Set-membership       10, 14, 23, 25, 36, 39                         6/40
Belief functions     6, 10, 24, 27, 33                              5/40
Other measures       10, 40                                         2/40

3.4.  Methods  used  for  Performance  Evaluation 

Table 5 summarizes the performance measures that have been used for
prognostics-oriented publications. A taxonomy of performance measures
for RUL estimation was proposed in (Saxena, Celaya, et al., 2008;
Saxena, Celaya, Saha, Saha, & Goebel, 2010), where different
categories were presented: accuracy-based, precision-based,
robustness-based, trajectory-based, computational performance and
cost/benefit measures, as well as some measures dedicated
specifically to prognostics (PHM metrics). Since this problem
involves predictions on multiple units, it is expected that the
majority of publications would use error-based accuracy and precision
metrics. A metric like the Mean Squared Error (MSE) has been used in
two different ways: for the estimation of the goodness of fit between
a predicted and a real signal, and as an accuracy-based metric to
aggregate errors in RUL estimation. Only the publications that fall
under the latter category are included in the table. The table
clearly shows that accuracy-based measures were most widely used, in
particular the scoring function from the PHM08 challenge, which also
weighs accuracy by timeliness of predictions. Broader usage of this
metric is also explained by the fact that this is the only metric for
which scores from the data challenge were available and can be used
as a benchmark to compare with any new development. However, one may
also compute additional measures if using only the training datasets
where full trajectories are available. In that case, approaches like
leave-one-out validation become applicable, where all training
instances but one are used for training each time and the remaining
one is used for performance evaluation. Then the average of the
performance measure is computed from all the runs. Publication 27
presents this approach for dataset #1, and a cross-validation
procedure for dataset #5T is used in publication 21. Note that
publications 19, 20, 32 provide only the RUL estimates for all
testing instances (without computing any metrics) and publications
10, 27 present distributions of errors.
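
The leave-one-out procedure described above can be sketched generically. The helper `leave_one_out` and the callables `fit` and `evaluate` are hypothetical placeholders; the toy model used in the example (predicting the mean training lifetime) is only there to make the sketch runnable and is not taken from any reviewed publication.

```python
def leave_one_out(units, fit, evaluate):
    """Average a performance measure over leave-one-unit-out runs.

    `fit(train_units)` returns a model and `evaluate(model, held_out)`
    returns a scalar measure; both are supplied by the caller.
    """
    scores = []
    for i, held_out in enumerate(units):
        train = units[:i] + units[i + 1:]  # all training units but one
        model = fit(train)
        scores.append(evaluate(model, held_out))
    return sum(scores) / len(scores)


# Toy example: each "unit" is just its lifetime in cycles, the model is the
# mean training lifetime, and the measure is the absolute error.
lifetimes = [100, 120, 140]
average_error = leave_one_out(
    lifetimes,
    fit=lambda train: sum(train) / len(train),
    evaluate=lambda model, unit: abs(model - unit),
)
```

Holding out entire units, rather than individual samples, keeps all measurements of the evaluated engine unseen during training.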


Table 5. Performance measures used in prognostics-oriented
publications applied on C-MAPSS.

Categories    Measures       Publication ID                       Ratio
Accuracy      PHM08 Score    1, 2, 4, 5, 8, 16, 21, 29, 30, 40    10/40
              FPR, FNR       8, 10, 27, 40                        4/40
              MSE            3, 8, 15, 17, 29, 40                 6/40
              MAPE           4, 23, 28, 32, 34, 39, 40            7/40
              MAE            5, 13, 38, 40                        4/40
Precision     ME             25, 28, 32, 39                       4/40
              MAD            25                                   1/40
Prognostics   PH             7, 22                                2/40
              α-λ            7, 22                                2/40
              RA             7, 22, 34                            3/40
              CV             7, 22, 34                            3/40
              AB             34                                   1/40
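
Several of the error-based measures in Table 5 can be computed from a list of true and predicted RUL values when ground truth is available. The sketch below uses the textbook definitions of ME, MAE, MSE and MAPE; the sign convention (predicted minus true) and the function names are assumptions, and individual publications may define these measures slightly differently.

```python
def rul_errors(rul_true, rul_pred):
    """Signed errors, ASSUMED convention: predicted minus true RUL."""
    return [p - t for t, p in zip(rul_true, rul_pred)]


def mean_error(rul_true, rul_pred):
    errors = rul_errors(rul_true, rul_pred)
    return sum(errors) / len(errors)


def mean_absolute_error(rul_true, rul_pred):
    errors = rul_errors(rul_true, rul_pred)
    return sum(abs(e) for e in errors) / len(errors)


def mean_squared_error(rul_true, rul_pred):
    errors = rul_errors(rul_true, rul_pred)
    return sum(e * e for e in errors) / len(errors)


def mean_absolute_percentage_error(rul_true, rul_pred):
    """In percent; requires every true RUL to be non-zero."""
    ratios = [abs(p - t) / abs(t) for t, p in zip(rul_true, rul_pred)]
    return 100.0 * sum(ratios) / len(ratios)


true_rul = [100, 50]
pred_rul = [110, 40]
me = mean_error(true_rul, pred_rul)
mae = mean_absolute_error(true_rul, pred_rul)
mse = mean_squared_error(true_rul, pred_rul)
mape = mean_absolute_percentage_error(true_rul, pred_rul)
```

In this example the mean error is zero although both predictions are off by ten cycles, which shows why a signed measure alone is not sufficient for comparing algorithms.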

4.  Prognostic  Approaches 

C-MAPSS datasets were generated to allow development and benchmarking
of various prognostics approaches. However, as observed from the
literature review (see Section 3.2), many researchers have used them
to cast a multiclass classification problem instead, even though the
majority of publications did use them to develop prognostics
algorithms. This section focuses on describing those prognostic
approaches. The approaches used on C-MAPSS datasets can be divided
into three broad categories as described next.

4.1.  Category 1: Using functional mappings between a set of inputs and RUL

Methods in this category (see Table 6) first transform the training
data (trajectories) into a multidimensional feature space and use the
corresponding RUL to label the feature vectors. Then, using
supervised learning methods, a mapping between feature vectors and
RUL is developed. Methods within this category are mostly based on
Neural Networks with various architectures. Different sensor channels
were used to generate corresponding features. However, it was
observed that the approaches yielding good performance also included
a feature selection step through advanced parameter optimization,
such as using a genetic algorithm and Kalman filtering as described
in publications 2, 3, which ranked 2nd and 3rd respectively in the
competition.
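
A minimal sketch of a Category 1 mapping is given below. To keep it short and dependency-free, ordinary least squares is swapped in for the neural networks used in the reviewed publications; the functions `fit_linear` and `predict_rul` and the toy feature vectors are hypothetical, and the solver assumes the features are not collinear.

```python
def fit_linear(features, targets):
    """Least-squares fit of RUL ~ w . x + b via the normal equations."""
    rows = [list(x) + [1.0] for x in features]  # append a bias term
    n = len(rows[0])
    # Augmented system (X^T X | X^T y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]  # feature weights followed by bias


def predict_rul(weights, x):
    """Apply the learned mapping to one feature vector."""
    return sum(w * v for w, v in zip(weights, x)) + weights[-1]


# Toy data generated from RUL = 100 - 10*x1 - 20*x2.
train_x = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
train_rul = [90.0, 80.0, 70.0, 60.0]
weights = fit_linear(train_x, train_rul)
estimate = predict_rul(weights, (0.5, 0.5))
```

The structure is the same whatever regressor is used: feature vectors labeled with RUL go in, and a function from features to RUL comes out.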

Table 6. Category 1 methods using a mapping learned between a subset
of sensor measurements as inputs and RUL as output.

Methods                          Publication ID
RNN, EKF                         2
MLP, RBF, KF, Ensemble           3
MLP                              8
ANN                              9
ESN                              20
Fuzzy rules, genetic algorithm   36
MLP, adaboost                    38

4.2.  Category 2: Functional mapping between health index (HI) and RUL

Methods listed in Table 7 are based on the estimation of two mapping
functions: the first maps sensor measurements to a health index (1-D
variable) for each training unit; the second links health index
values to the RUL. These approaches construct a library of
degradation models. Inference of the RUL for a given test instance
includes using the library as prior knowledge to update the
parameters of the model corresponding to the new test instance.
Updating can be done using Bayes rule as proposed in publication 4,
or using other model averaging or ensemble techniques designed to
take into account the uncertainty inherent to the model selection
process (Raftery, Gneiting, Balabdaoui, & Polakowski, 2003).

Table 7. Category 2 methods using health index as input and RUL as
output.

Methods                                            Publication ID
Quadratic fit, Bayesian updating                   4
Logistic regression                                5
Kernel regression, RVM                             7
RVM                                                16
Gamma process                                      17
Linear, Bayesian updating                          19
RVM, SVM, RNN, Exponential and quadratic fit,      21
  Bayesian updating
Exponential fit                                    28
Wiener process                                     29
Copula                                             30
HMM, LS-SVR                                        34

Table 8 lists some other approaches that use approximation functions
to represent the evolution of individual sensor measurements through
time. Given a test instance, as many predictions are made as there
are sensors. These predictions are then used in a classifier that
assigns a class label related to the identified degradation level.
Some of these approaches also update classifier parameters with new
measurements using Bayesian updating rules as mentioned previously.
These methods were, however, applied only on dataset #1, in which
sensors depict clear monotonic trends.

Table 8. Category 2 methods based on individual sensor modeling and
classification.

Methods                            Publication ID
exTS, supervised classification    10
SVR                                13
exTS, ARX                          14
ANN, ANFIS                         23
Piece-wise linear (multi-models)   24
exTS                               25
ELM, unsupervised classification   32
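
The second mapping of Category 2 (from health index to RUL) can be sketched by fitting a degradation model to the health index observed so far and extrapolating it to a failure threshold. The linear model, the threshold value of 0.0, and the function names below are assumptions made for illustration; the library-based updating described above is not reproduced.

```python
def fit_line(ts, ys):
    """Ordinary least-squares line through (t, y) pairs: slope, intercept."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    return slope, y_mean - slope * t_mean


def rul_from_health_index(cycles, hi, threshold=0.0):
    """Extrapolate a linear HI trend to an ASSUMED failure threshold."""
    slope, intercept = fit_line(cycles, hi)
    if slope >= 0:
        return None  # no downward trend, so nothing to extrapolate
    t_fail = (threshold - intercept) / slope
    return max(0.0, t_fail - cycles[-1])


# HI decreasing from 1.0 (healthy) by 0.01 per cycle, observed up to cycle 30.
rul = rul_from_health_index([0, 10, 20, 30], [1.0, 0.9, 0.8, 0.7])
```

With these toy values the fitted line crosses the threshold at cycle 100, so about 70 cycles remain. Quadratic or exponential forms, as listed in Table 7, would replace `fit_line` without changing the overall structure.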

4.3.  Category  3:  Similarity-based  matching 

In these methods (Table 9), historical instances of the system
(sensor measurement trajectories labeled with known failure times)
are used to create a library. For a given test instance, similarity
with instances in the library is evaluated, generating a set of
Remaining Useful Life (RUL) estimates that are eventually aggregated
using different methods. Compared to category 2 methods, these
methods do not abstract training trajectories into features; instead,
the trajectory data (possibly filtered) are themselves stored.
Similarity is computed in the sensor space as in publication 27 or
using health indices as in publications 1, 7, 17, 21, 40.

As mentioned in publications 1, 7, in practice the test instance and
the training instance may take different amounts of time to reach a
particular degradation level from the initial healthy state.
Therefore, similarity-based matching must accommodate this difference
in the early phases of degradation curves. In publication 40, this
problem was tackled by assuming a constant initial wear for all
instances, yielding an offset on health indices. Efficient similarity
measures are also necessary to cope with noise and varying
degradation paths. For instance, in publications 1, 7 three different
similarity measures were used, and in publication 40 computational
geometry tools were used for instance representation and similarity
evaluation.
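The matching-and-aggregation idea can be sketched as follows. This is
a simplified illustration only: the mean squared distance, the
exponential weighting, the bandwidth value and the function names
are assumptions, and they do not reproduce the similarity measures
of publications 1, 7 or 40. Each trajectory is taken to be a list of
health index values, one per cycle, ending at failure.

```python
import math


def best_alignment(test, train):
    """Slide `test` along `train`; return (min mean sq. distance, offset)."""
    n = len(test)
    best = (float("inf"), 0)
    for offset in range(len(train) - n + 1):
        window = train[offset:offset + n]
        dist = sum((a - b) ** 2 for a, b in zip(test, window)) / n
        best = min(best, (dist, offset))
    return best


def similarity_rul(test, library, bandwidth=0.01):
    """Similarity-weighted average of the RULs implied by the library."""
    num = den = 0.0
    for train in library:
        if len(train) < len(test):
            continue  # library trajectory too short to match against
        dist, offset = best_alignment(test, train)
        # Cycles remaining in the library unit after the matched window.
        rul = len(train) - (offset + len(test))
        weight = math.exp(-dist / bandwidth)
        num += weight * rul
        den += weight
    return num / den if den else None


# Toy library (hypothetical): two run-to-failure health index curves.
slow_unit = [1 - i / 100 for i in range(101)]
fast_unit = [1 - i / 50 for i in range(51)]
test_unit = [1 - i / 100 for i in range(20, 40)]

rul_single = similarity_rul(test_unit, [slow_unit])
rul_both = similarity_rul(test_unit, [slow_unit, fast_unit])
```

Sliding the test window along each library trajectory is one simple
way to accommodate different times to reach a given degradation
level; each library instance is processed independently, so the loop
is straightforward to parallelize.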

Table 9. Category 3 methods using similarity-based matching.

Methods                                                  Publication ID
HI-based, 3 similarity measures and kernel smoothing     1, 7
Similar to 1 and 7 using 1 similarity measure            22
Feature-based similarity, 1 similarity measure,
  ensemble, degradation levels classification            27
HI-based similarity, polygon coverage similarity,
  ensemble                                               40


An advantage of approaches in this category is that new instances can
be easily incorporated. Moreover, similarity-based matching
approaches have demonstrated good generalization capability on all
C-MAPSS datasets, as shown in publications 1, 7, 40, despite a high
level of noise, multiple simultaneous fault modes, and a number of
operating conditions. This category of algorithms is also relatively
easy to parallelize, which reduces the computational time needed for
inference.

5. Some Guidelines to Using C-MAPSS Datasets

Another contribution of this paper is a set of guidelines for using
the C-MAPSS datasets that may help future users understand and
utilize these datasets better. The guidelines summarize information
gathered from the literature review and the authors' own experience,
which in many cases goes beyond the documentation provided along with
the datasets. Specifically, they offer some general processing steps
and list relevant publications that describe the implementation of
these steps, which could be useful in developing a prognostic
algorithm (Figure 1).


[Figure 1: flowchart of five steps. (1) Understanding C-MAPSS data
and dataset selection: Turbofan Dataset from NASA (#1, #2, #3, #4),
PHM08 Challenge Dataset (#5T, #5V). (2) Defining the problem:
multiclass classification, prognostics. (3) Data preparation: create
train, test, validation sets; sensor selection; feature extraction;
noise filtering. (4) Learning and predicting: neural network-based,
extrapolation-based, similarity-based methods. (5) Performance
evaluation: choice of metrics, comparison with benchmarks, evaluation
on challenge validation set by NASA.]

Figure 1. Guidelines to Using C-MAPSS Datasets.


Based on the analysis presented in Section 3, five general data
processing and algorithmic steps are considered:

[Step 1:] Understanding C-MAPSS datasets - Comprehensive background
information on turbofan engines and C-MAPSS datasets is well
presented in three publications: (Saxena, Goebel, Simon, & Eklund,
2008b), (Richter, 2012), and (T. Wang, 2010). More details about the
hierarchical decomposition of the simulated system into critical
components can also be found in (Frederick et al., 2007; Abbas,
2010), which provide valuable domain knowledge. These publications do
not focus on the physics-of-failure of turbofan engines but describe
the generation of these datasets and various practical aspects of
using C-MAPSS datasets for prognostics. These include descriptions of
sensor measurements, illustrations of operating conditions, impact of
fault modes, etc., which can play an important role in improving
data-driven prognostics algorithms as well. Datasets #1 to #4
represent increasing degrees of complexity and, therefore, it is
recommended to use them in that order to incrementally develop
methods that accommodate each complexity one by one. The challenge
datasets fall somewhere in the middle as far as complexity level goes
but suffer from a lack of ground truth information, which prevents
quick feedback during algorithm development. Therefore, these
datasets may be used as validation examples, and results on them
should be compared to other approaches using the benchmarks presented
in Section 2.2.

[Step 2:] Defining the problem - Given the nature of these datasets,
several types of problems can be defined. As mentioned in Section
3.2, in addition to prediction, a multi-class classification problem
can be defined for a multidimensional feature space. However, the
intent behind these data was to promote prognostics algorithm
development. Since these data consist of multiple trajectories, the
problem of predicting the RUL for all trajectories can be constructed
just like the one posed in the data challenge. However, one could
also define the problem at a finer granularity by modeling the
degradation of each trajectory individually and predicting RUL at
multiple time instances, which would be closer to a condition-based
prognostics context.

[Step 3:] Data preparation - After a dataset (turbofan or data
challenge) is selected, it is suggested to split the original
training dataset into two subsets: a training dataset for model
parameter estimation (learning) and a testing dataset to test the
learned model⁷ (see for example publications 21, 40). For datasets
#1-4, corresponding RUL vectors are provided for the test sets so
users can validate their algorithms. However, for the challenge
datasets, the evaluations can only be obtained by uploading the RUL
estimates to the data repository website. Therefore, it may be
desirable to split the training set itself for training, test, and
validation purposes during algorithm development. The next step is to
downselect sensors to reduce problem dimensionality. Some data
exploration and preparation approaches for the data challenge
(datasets #5T and #5V) are well described in publications 1, 2 and 7.
Some "heuristic rules" to avoid over-predictions are also presented
in publication 40 and applied on all five C-MAPSS datasets. Some of
the better performing methods are based on PCA, such as in
publication 1, and on other sensor selection procedures, such as in
publications 2, 3 and 40. From the survey it was noted that the most
commonly selected subset of sensors was 7, 8, 9, 12, 16, 17, 20 (as
initially suggested in publication 1). Additional sensors may also be
considered, similar to the approach proposed in publication 40, where
a total of 511 combinations were studied for each dataset for an
exhaustive evaluation.
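The split and sensor downselection can be sketched as below. The
helper names, the 80/20 ratio and the random seed are assumptions
made for illustration. Note that 511 equals the number of non-empty
subsets of nine candidate sensors (2^9 - 1); whether publication 40
enumerated its combinations exactly this way is an assumption here.

```python
import random
from itertools import combinations


def split_units(unit_ids, validation_fraction=0.2, seed=0):
    """Split whole engine units (not single cycles) into train/validation."""
    units = sorted(set(unit_ids))
    random.Random(seed).shuffle(units)
    n_val = max(1, int(len(units) * validation_fraction))
    return units[n_val:], units[:n_val]


def sensor_subsets(candidates):
    """Yield every non-empty subset of the candidate sensors."""
    for size in range(1, len(candidates) + 1):
        yield from combinations(candidates, size)


# Hold out 20 of 100 hypothetical training units for validation.
train_units, val_units = split_units(range(1, 101))

# Subset most commonly reported in the survey.
common_subset = (7, 8, 9, 12, 16, 17, 20)

# Exhaustive search space over nine hypothetical candidate sensors.
n_subsets = sum(1 for _ in sensor_subsets(list(range(9))))
```

Splitting by unit rather than by cycle keeps all measurements of one
engine on the same side of the split, so the validation score is not
inflated by cycles from engines already seen during training.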

[Step 4:] Learning and Predicting - This step forms the core of the
prediction problem. As described in Section 3, a variety of learning
approaches can be employed to learn various mappings between the
sensor data and system health in order to compute RUL. Some of these
methods try to learn RUL as a function of sensor data (system state)
or features thereof, while others estimate a health index first. Each
trajectory can be modeled as a degradation process, using regression
methods to predict when it crosses the zero health threshold.
Approaches based on health index computation can be applied to all
datasets. The approach proposed in publications 1, 7 is the simplest
to implement. To deal with normalization (or alternatively
segmentation) of data by operating conditions, one could use a
clustering approach as suggested by the authors above, or one may
directly use the parameters described in publication 18 to validate
the performance of segmentation. Some variants for health indicator
estimation can also be picked from publications 21 and 40.
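Normalization by operating condition can be illustrated with the
sketch below, which assumes that each measurement has already been
assigned a regime label (for instance by clustering the operational
settings). The function name and the toy values are hypothetical, and
per-regime standardization is only one possible choice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_regime(regimes, values):
    """Standardize each reading with the mean/std of its own regime."""
    groups = defaultdict(list)
    for regime, value in zip(regimes, values):
        groups[regime].append(value)
    # Fall back to a unit scale when a regime shows no variation.
    stats = {r: (mean(vs), pstdev(vs) or 1.0) for r, vs in groups.items()}
    return [(v - stats[r][0]) / stats[r][1]
            for r, v in zip(regimes, values)]


# One sensor observed under two alternating regimes with very
# different baselines; the raw series jumps between ~100 and ~500.
regimes = [0, 1, 0, 1, 0, 1]
values = [100.0, 500.0, 102.0, 504.0, 104.0, 508.0]
normalized = normalize_by_regime(regimes, values)
```

After normalization the regime-induced jumps disappear and the
underlying monotonic trend of the sensor becomes visible, which is
what health index estimation relies on.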

[Step 5:] Performance evaluation - Once a learned model yields
satisfactory results on the testing set put aside by partitioning the
training data, one may use the actual test dataset provided with the
datasets. After further tuning, especially for the challenge datasets
(#5T and #5V), a final validation can be done by submitting the
results to the NASA repository. Before uploading the final
submission, the generalization capability should be checked by
computing several performance metrics as discussed in Section 2.2.
Some benchmarks have been provided in Section 2.2 using metrics that
aggregate prediction performance from multiple units. While the exact
numbers would not match, the performance is expected to be in a
similar range for results obtained from the turbofan datasets, for
which the RUL is available. For comparison purposes, the scores
obtained in previous works on complete C-MAPSS trajectories are
summarized in publication 40. Note that with full trajectory data it
is possible to compute prognostics metrics as presented in (Saxena,
Celaya, et al., 2008; Saxena et al., 2010), since the actual EOL is
known a priori. This allows testing the critical time aspect of a
prediction in addition to accuracy and precision measures.
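Two aggregate metrics can be sketched as follows. The score below
assumes the asymmetric exponential form used for the PHM08 challenge,
with constants 13 for early and 10 for late predictions; these
constants and the function names are assumptions to be checked
against Section 2.2 before any comparison with published benchmarks.

```python
import math


def phm08_score(true_rul, predicted_rul):
    """Asymmetric score: late predictions are penalized more than early."""
    total = 0.0
    for actual, predicted in zip(true_rul, predicted_rul):
        error = predicted - actual
        if error < 0:
            total += math.exp(-error / 13.0) - 1.0  # early prediction
        else:
            total += math.exp(error / 10.0) - 1.0   # late prediction
    return total


def rmse(true_rul, predicted_rul):
    """Root mean squared error over all units."""
    squared = [(p - a) ** 2 for a, p in zip(true_rul, predicted_rul)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical example: one unit predicted 10 cycles early, one late.
early = phm08_score([100], [90])
late = phm08_score([100], [110])
error = rmse([100, 100], [90, 110])
```

Because the score is a sum over units, it grows with the size of the
test set and is dominated by a few large late errors, so it is worth
reporting alongside a symmetric metric such as RMSE.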

6.  Conclusion 

As observed from the published PHM literature, among the openly
available prognostic datasets the most widely used for data-driven
prognostics come from the C-MAPSS turbofan simulator. Guided by this
observation, a survey of approaches developed using these datasets
(since 2008) was carried out with the purpose of understanding the
current state of the art and assessing how these datasets have helped
in the development of prognostic algorithms. However, it was noticed
that, due to several factors, these datasets did not get used as
intended and any meaningful comparison between approaches was not
trivial. Specifically, the following observations were made, and this
paper tries to alleviate some of these factors to improve usage of
these datasets as originally intended.

•  Despite several thousand downloads, only 70 papers referring to
C-MAPSS were found in the published literature. This suggests that a
vast majority of those who downloaded the data did not get to utilize
them to the point of publishing results. Therefore, some guidance has
been provided to help in understanding these datasets and how a
prognostics problem may be set up in a few different ways.
Furthermore, a description of all five C-MAPSS datasets is provided,
identifying their distinguishing characteristics and clearing up some
misunderstandings identified from the survey.

•  Among the 70 papers, only a few actually used the testing datasets
for evaluating their methods. A mix of different datasets and
performance metrics was observed in the survey. This made it
difficult to compare performance between different reported methods
in a consistent manner. Therefore, a better explanation of the
differences between these datasets, together with the top thirty
scores from the challenge datasets, should help future users compare
their methods against a benchmark in a more consistent manner.
Furthermore, it is also suggested how results from datasets that are
not from the challenge could be compared against this benchmark
established on the challenge set.

•  The survey reveals usage of various prognostics approaches that
can be divided into three main categories. These approaches are
briefly described along with potential areas for further improvement.
The survey also demonstrated that C-MAPSS datasets can be used for
developing and testing methods for several intermediate steps in
prognostics, such as sensor selection, health indicator estimation,
and operating conditions modeling, in addition to fault estimation
and prediction.

With the analysis presented here and references to a variety of
approaches employed, this paper hopes to establish public knowledge
that can be used by future users in prognostic algorithm development
and aid in fulfilling the underlying intent of the data repository:
to facilitate algorithm benchmarking and further development. The
issue of performance benchmarking remains to be explored as part of
future work, in which the authors plan to compute performance for
challenge entries based on several other metrics that will allow
comparisons with performance results reported in many publications.

Nomenclature

PHM      Prognostics and Health Management
RUL      Remaining Useful Life
CMAPSS   Commercial Modular Aero-Propulsion System Simulation
HI       Health index
MLP      Multilayer perceptron
ANN      Artificial neural network
RNN      Recurrent neural network
RBF      Radial basis function
ESN      Echo state network
ELM      Extreme learning machine
EKF      Extended Kalman filter
KF       Kalman filter
SVR      Support vector regression
LS-SVR   Least squares support vector regression
exTS     Evolving extended Takagi-Sugeno system
ARX      Autoregressive exogenous model
ANFIS    Adaptive neuro fuzzy inference system
RVM      Relevance vector machine
HMM      Hidden Markov model
PCA      Principal components analysis
MSE      Mean squared error
MAPE     Mean absolute percentage error
MAE      Mean absolute error
ME       Mean error
PH       Prediction horizon
AP       Acceptable predictions (rate)
α-λ      Accuracy at specific times
RA       Relative accuracy
CV       Convergence
AB       Average bias
FPR      False positive rate
FNR      False negative rate

Appendix

All references were mapped to numeric identifiers, which are used in
the survey and analysis results for better readability. This mapping
is provided in Table 10 below.


Table 10. References to ID mapping.